On the Importance of Pre-emphasis and Window Shape in Phase-Based Speech Recognition

Authors

  • Erfan Loweimi
  • Seyed Mohammad Ahadi
  • Thomas Drugman
  • Samira Loveymi
Abstract

This paper investigates the potential of the phase spectrum in automatic speech recognition (ASR). We show that the speech phase spectrum could potentially provide features with high discriminability and robustness. Motivated by this, and to realize more of the phase spectrum's potential, we propose two simple amendments to two common blocks in feature extraction, namely pre-emphasis and windowing, without changing the workflow of the algorithms. Recognition tests on Aurora 2 indicate average performance improvements of up to 11.2% and 14.7% in the presence of both additive and convolutional noise for the phase-based MODGDF and CGDF features, respectively. These results demonstrate the high potential of the phase spectrum for robust ASR.
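For orientation, the two blocks named in the abstract can be sketched in their conventional form, followed by a plain group delay computation of the kind that phase-based features build on. This is only a minimal illustration, not the amendments proposed in the paper, which the abstract does not specify. The function names, the pre-emphasis coefficient of 0.97, the Hamming window, and the 200-sample frame with an 80-sample hop (25 ms and 10 ms at an assumed 8 kHz sampling rate) are all assumptions chosen for the example.

```python
import numpy as np


def pre_emphasis(signal, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])


def frame_and_window(signal, frame_len=200, hop=80, window=np.hamming):
    """Split a signal into overlapping frames and apply a window to each frame."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * window(frame_len)


def group_delay(frame, n_fft=256, eps=1e-10):
    """Group delay of one frame, computed without phase unwrapping.

    Uses the identity tau(w) = (X_R * Y_R + X_I * Y_I) / |X|^2,
    where X is the DFT of x[n] and Y is the DFT of n * x[n].
    eps is a small assumed constant that avoids division by zero.
    """
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(8000)  # one second of noise as a stand-in for speech
    frames = frame_and_window(pre_emphasis(x))
    gd = np.array([group_delay(f) for f in frames])
    print(frames.shape, gd.shape)
```

Swapping the `alpha` value or passing a different `window` callable (for example `np.hanning`) is one way to explore how these two blocks affect the resulting group delay.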


Similar resources

Improving the performance of MFCC for Persian robust speech recognition

The Mel frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...


Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...


Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model

Speech is one of the richest and most immediate ways for human beings to express emotion, and it conveys cognitive and semantic concepts among humans. In this study, a statistical method for emotion recognition from speech signals is proposed, and a learning approach based on the statistical model is introduced to classify the internal feelings of the utterance....


Disguised Face Recognition by Using Local Phase Quantization and Singular Value Decomposition

Disguised face recognition is a major challenge in the field of face recognition that has received comparatively little attention. Therefore, this paper presents a disguised face recognition algorithm based on the Local Phase Quantization (LPQ) method and Singular Value Decomposition (SVD), which deals with two main challenges. The first challenge is when an individual intentionally alters the appearance ...


Search Space Reduction for Farsi Printed Subwords Recognition by Position of the Points and Signs

In the field of word recognition, three approaches are used: word isolation, overall shape, and a combination of the two. Most optical recognition methods recognize a word by breaking it into its letters and then recognizing them. This approach faces some problems because of the difficulty of isolating letters and its recognition accuracy in texts with low image quality. Therefo...




Publication date: 2013